System and method for relational representation of hierarchical data

ABSTRACT

A technique for representing the structure of hierarchically-organized data in a non-hierarchical data structure, such as a relation. The hierarchically-organized data is represented as a tree, and each node in the tree is assigned a position identifier that represents both the depth level of the node within the hierarchy, and its ancestor/descendant relationship to other nodes. The data represented by each node, as well as its position identifier, is stored in a row of a relational database, thereby capturing the hierarchical structure of the data in such relational database. A technique is provided for the compressed storage of position identifiers in a format that allows an efficient bytewise comparison of position identifiers to determine relative order and ancestry.

FIELD OF THE INVENTION

[0001] The present invention relates generally to the field of computing. More particularly, the invention relates to a method of storing hierarchically-organized data, such as eXtensible Markup Language (XML) data, in a non-hierarchical data structure such as a relation.

BACKGROUND OF THE INVENTION

[0002] Much data is organized in a “hierarchical” format, that is, a format that permits the specification of a hierarchy of structures and sub-structures. For example, eXtensible Markup Language (XML) is a popular format for representing data, and XML supports a hierarchical format in the sense that data may be “bracketed” with tags, and tags may be nested within other tags.

[0003] While it is common to organize information hierarchically, the most common means of storage is a database which stores data in relational tables. Relational tables are not hierarchical; they are “flat.” Relational databases store rows of columnar data; the rows may be placed in an order, but a relation has no inherent hierarchical structure. It would be advantageous to represent hierarchically-organized data (such as an XML document) in a “flat” data structure (such as a relation), such that the hierarchical structure of the data can be captured and preserved in the flat data structure.

[0004] In view of the foregoing, there is a need for a system and method for representing hierarchical data that overcomes the drawbacks of the prior art.

SUMMARY OF THE INVENTION

[0005] The present invention provides a technique for representing hierarchical data in a non-hierarchical data structure. Hierarchical data (e.g., XML data) can be viewed as having a “tree” structure. That is, XML data is bracketed by a series of tags; the top level tag corresponds to the root node of the tree, “sub”-tags embedded within the highest-level tag correspond to the children of the root, and so on. Moreover, among the children of a given node in the tree, an order may be defined based on the order in which the various tags (and sub-tags, sub-sub-tags, etc.) appear in the XML document.

[0006] This structure may be captured with a position-identifier scheme referred to herein as “ORDPATH.” A position identifier is a label associated with each node represented in hierarchical data. The position identifier captures position information about the node that represents both the level in the hierarchy at which the node appears, as well as the node's relationship to its ancestors and descendants. For example, the root node of a tree may have the position identifier “1”, the children may have the identifiers “1.1”, “1.3”, and “1.5” (skipping the even numbered values, for a reason described below), the children of node “1.1” may be numbered “1.1.1”, “1.1.3”, etc. This type of position-identifier numbering scheme allows the hierarchical structure of the tree to be represented in the sense that, for any pair of nodes, it is possible to determine which node appears leftmost (or rightmost) in the tree (assuming a document pre-order traversal), and whether one node is an ancestor (or descendant) of the other.

[0007] A technique is provided whereby certain numbers (e.g., all even numbers) are used for “indirection,” so that a new child node can be inserted between existing nodes after the initial nodes have been stored according to the above-described scheme, and such that the position identifiers continue to capture the hierarchical structure of the tree. Moreover, a technique is provided for representing position identifiers in a compressed format that allows an efficient byte-wise comparison of position identifiers to determine order and ancestry. Additionally, a hybrid numbering scheme is provided that, in certain circumstances, allows shorter-length position identifiers, while still supporting insertion of nodes and the determination of ancestry relationships.

[0008] Other features of the invention are described below.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings exemplary constructions of the invention; however, the invention is not limited to the specific methods and instrumentalities disclosed. In the drawings:

[0010] FIG. 1 is a block diagram of an exemplary computing environment in which aspects of the invention may be implemented;

[0011] FIG. 2 is a diagram of exemplary data which is hierarchically organized;

[0012] FIG. 3 is a block diagram of an exemplary data structure which represents the hierarchically-organized data of FIG. 2;

[0013] FIG. 4 is a block diagram of an exemplary non-hierarchical data structure which represents the hierarchically-organized data of FIG. 2;

[0014] FIG. 5 is a block diagram of an exemplary tree data structure, with tree nodes assigned position identifiers in accordance with an aspect of the invention;

[0015] FIG. 6 is a block diagram of the exemplary tree data structure of FIG. 5, with new data inserted therein in accordance with aspects of the invention;

[0016] FIG. 7 is a block diagram of an exemplary data structure which may be used to represent hierarchical position data in accordance with a preferred embodiment of the invention; and

[0017] FIG. 8 is a flow diagram of an exemplary process of comparing position information for two nodes in a hierarchical data structure.

DETAILED DESCRIPTION OF THE INVENTION

[0018] Overview

[0019] In many cases, data is hierarchically organized. Data written in eXtensible Markup Language (XML), where portions of the data may be delimited by a series of nested tags, is a case in point. The present invention provides a technique for storing such hierarchical data in a non-hierarchical data structure such as a relation, while still maintaining information about the hierarchical structure of the data. Thus, hierarchically-organized data may be stored by efficient means, such as a commercial database system.

[0020] Exemplary Computing Environment

[0021] FIG. 1 illustrates an example of a suitable computing system environment 100 in which the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

[0022] The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

[0023] The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network or other data transmission medium. In a distributed computing environment, program modules and other data may be located in both local and remote computer storage media including memory storage devices.

[0024] With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus).

[0025] Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

[0026] The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.

[0027] The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 1 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156, such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.

[0028] The drives and their associated computer storage media discussed above and illustrated in FIG. 1 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 1, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.

[0029] The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

[0030] When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 1 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

[0031] Hierarchically-Organized Data

[0032] FIG. 2 shows an example of hierarchically-organized data. For exemplary purposes, hierarchically-organized data 200 is shown in eXtensible Markup Language (XML). However, it will be understood that XML is merely one way of representing hierarchical data, and that the invention applies to hierarchical data generally, regardless of whether such data is represented in XML.

[0033] Exemplary hierarchical data 200 comprises a plurality of tags 201, 203-209. Hierarchical data 200 may also include one or more attribute definitions 202. In XML, “tags” are objects that begin with ‘<’ and end with ‘>’, with the user-assigned tag name directly following the ‘<’. Hierarchical data 200 further includes data items 221-225. Data item 221 is referred to as an “attribute value” (i.e., the value of the attribute ISBN). Data items 222-225 are referred to as “element values.” For example, the data 222 delimited by the tag pair “<TITLE>” and “</TITLE>” is an “element value.” In general, data items 221-225 represent some type of information, and tags 201, 203-209, as well as attribute definition 202, help to impose a type of hierarchical structure on that information. In exemplary hierarchically-organized data 200, the information represented by the data is a textual narrative about tree frogs.

[0034] The structure imposed upon the textual information is defined by tags and attribute definitions. For example, at the highest level, the information is a document, identified by the pair of tags 201 and 209. (In XML, levels of organization are delimited by the convention that an element of the hierarchy begins with a token of the form “<TAG>” and ends with a token of the form “</TAG>”; two such tags are thus generally required to contain an element.) In exemplary data 200, the document includes a title and one or more sections. The title is delimited by the pair of tags 203 and 204. A section is delimited by the pair of tags 205 and 208. A section may have passages of bold text, which, in this example, are delimited by tags 206 and 207. The hierarchical structure of data 200 can be appreciated from the tags: that is, bold text is a component of a section, and a section and a title are components of a document. Some data in the hierarchy represents an attribute value and does not require delimiters. For example, a document in this example also includes an ISBN number (reference numeral 221), which, in this case, is assigned as the value of ISBN attribute 202.

[0035] Tree Representation of Hierarchically Organized Data

[0036] FIG. 3 shows a tree data structure 300 that represents the hierarchically-organized data 200 depicted in FIG. 2. Tree 300 comprises a plurality of nodes 302-314. The hierarchical structure of data 200 is readily apparent in tree 300. Node 302, which is the highest-level node in the tree, represents the “DOCUMENT” tag, which is the highest-level tag structure in data 200. As noted above, the components of a “document,” in this example, are an ISBN number, a title, and a section. These components are represented by child nodes 304, 306, and 308. It will be noted that, among nodes 304, 306, and 308, there is a left-to-right ordering, which corresponds to the order in which those components appear in hierarchically-organized data 200. That is, in data 200, the ISBN number appears before the title, and the title appears before the section. Thus, in tree 300, node 304 representing the ISBN number is the leftmost child of document node 302; node 306 representing the title is to the right of ISBN node 304, and node 308 representing the section is to the right of title node 306.

[0037] Section node 308 has child nodes 310, 312 and 314, representing the various components of text in the section. Node 310 represents the text “All right thinking people”; node 312 represents the bold text “love”; and node 314 represents the text “tree frogs.” In this example, the section text has been broken up into three components, because it is convenient to represent the bold text between tags 206 and 207 (shown in FIG. 2) as a child node of the “section” that contains that tag pair. By representing the text on either side of the bold text as nodes 310 and 314, the ordering among the bold text and the two pieces of non-bold text is represented in the tree through the left-to-right ordering among nodes 310, 312, and 314. If the text delimited by the <SECTION> and </SECTION> tags were short and had no tags nested within it, then the section text could simply be represented as an element value in node 308 without the use of child nodes; if the section text is particularly long, or if (as in the example of FIG. 3) the section contains one or more nested tags, then the various pieces of text can be represented as child nodes of node 308.

[0038] Each node in tree 300 is assigned a position identifier 325 referred to as an “ORDPATH.” Position identifiers 325 represent both the hierarchical and left-to-right position in tree 300 of a given node. That is, given the position identifiers 325 of any two nodes in tree 300, it is possible to determine whether one of the nodes is an ancestor (or descendant) of the other, and, if so, how many “generations” or “levels” separate the nodes. Moreover, it is possible to determine which of the nodes appears to the left (or right) of the other.

[0039] The “ORDPATH” shown in FIG. 3 is an exemplary numbering scheme for position identifiers 325. In this numbering scheme, node 302 is assigned the position identifier “1”. All child nodes of node 302 are assigned position identifiers that begin with “1” (i.e., “1.1” for node 304, “1.3” for node 306, and “1.5” for node 308). Similarly, since node 308 has position identifier “1.5”, all child nodes of node 308 have position identifiers that begin with “1.5”. Thus, nodes 310, 312, and 314 have position identifiers “1.5.1”, “1.5.3”, and “1.5.5”, respectively.
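By way of illustration only, the initial numbering described above may be sketched as follows. The sketch assumes a hypothetical in-memory tree of (name, children) tuples; the function name assign_ordpaths and the node names are illustrative and are not part of the figures.

```python
def assign_ordpaths(node, path=(1,)):
    """Yield (identifier, name) pairs in document order.

    `node` is a hypothetical (name, children) tuple. Each child receives
    the parent's components followed by the next odd ordinal 1, 3, 5, ...
    """
    name, children = node
    yield ".".join(str(c) for c in path), name
    for i, child in enumerate(children):
        yield from assign_ordpaths(child, path + (2 * i + 1,))


# A tree shaped like the document/ISBN/title/section example.
tree = ("DOCUMENT", [
    ("ISBN", []),
    ("TITLE", []),
    ("SECTION", [("text", []), ("BOLD", []), ("text", [])]),
])

labels = list(assign_ordpaths(tree))
for identifier, name in labels:
    print(identifier, name)
```

Even ordinals are deliberately never produced by this sketch, which leaves them free for later use.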

[0040] Information can be derived from these position identifiers 325 in the following manner. First, the number of dot-separated numbers in position identifiers 325 identifies the “depth” of a node within tree 300. That is, the position identifier of node 312 has three dot-separated numbers (i.e., “1”, “5”, and “3”), and thus is at the third level down from the highest level. These three dot-separated numbers are referred to in precise nomenclature as “component values of an ORDPATH.” Node 302 has only one component value in its position identifier (i.e., “1”), and thus is at the first level in tree 300.

[0041] Second, it can be determined whether an ancestor/descendant relationship exists between two nodes. Specifically, the position identifier numbering scheme shown in FIG. 3 obeys the property that a first node is an ancestor of a second node if and only if the first node's position identifier is a prefix of the second node's position identifier. Thus, node 306 cannot be an ancestor of node 312, because node 306's position identifier (i.e., “1.3”) is not a prefix of node 312's position identifier (i.e., “1.5.3”). However, it can readily be determined that node 308 is an ancestor of node 312, because node 308's position identifier “1.5” is, in fact, a prefix of “1.5.3.” (It will be observed that the highest-level node is labeled “1”, which indicates that the node has no ancestor.)
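The prefix test may be sketched as below. The helper names parse and is_ancestor are hypothetical. The sketch compares whole component values rather than characters, on the assumption that a plain string prefix test would wrongly treat “1.1” as a prefix of “1.11”.

```python
def parse(ordpath):
    """Split a dotted identifier such as "1.5.3" into a tuple of integers."""
    return tuple(int(c) for c in ordpath.split("."))


def is_ancestor(first, second):
    """True if `first` is a proper component-wise prefix of `second`."""
    a, b = parse(first), parse(second)
    return len(a) < len(b) and b[:len(a)] == a


print(is_ancestor("1.5", "1.5.3"))   # True
print(is_ancestor("1.3", "1.5.3"))   # False
print(is_ancestor("1.1", "1.11"))    # False
```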

[0042] Third, it can be determined how many generations separate two nodes, simply by comparing how many numbers are in the two nodes' respective position identifiers. Thus, it is possible to determine from position identifiers 325 that node 302 is two levels above node 310, because node 302 has position identifier “1” (one component value), whereas node 310 has position identifier “1.5.1” (three component values).
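Depth and generation counting may be sketched as follows. The function names are hypothetical, and the sketch assumes identifiers like those of FIG. 3, in which every component value counts toward depth.

```python
def components(ordpath):
    """Split a dotted identifier such as "1.5.3" into integer component values."""
    return [int(c) for c in ordpath.split(".")]


def depth(ordpath):
    # Counts every component value (an assumption that holds for FIG. 3 labels).
    return len(components(ordpath))


def generations_between(a, b):
    # Meaningful only when one node is an ancestor of the other.
    return abs(depth(a) - depth(b))


print(depth("1.5.3"))                      # 3
print(generations_between("1", "1.5.1"))   # 2
```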

[0043] Fourth, it can be determined which of the two nodes precedes the other in a left-to-right preorder traversal of tree 300. For example, it can be determined that node 306 appears before node 310 by comparing the numbers in their position identifiers until a pair of corresponding numbers is encountered that do not match. In the case of nodes 306 and 310, their respective first numbers match (i.e., both are a “1”), but their second numbers do not match (i.e., node 306 has a “3” in its second component value, but node 310 has a “5”). Since 3 is less than 5, node 306 precedes node 310 in the left-to-right preorder of tree 300. (As discussed below, a position identifier can be represented in a format that allows relative order of two nodes to be determined by a more efficient comparison method.)
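The component-by-component comparison may be sketched as below. The names parse and compare are hypothetical. The sketch additionally assumes that, when one identifier is a prefix of the other, the shorter one comes first, since an ancestor precedes its descendants in a preorder traversal.

```python
def parse(ordpath):
    """Split a dotted identifier into a tuple of integer component values."""
    return tuple(int(c) for c in ordpath.split("."))


def compare(a, b):
    """Return -1, 0, or 1 as node `a` is before, equal to, or after node `b`."""
    pa, pb = parse(a), parse(b)
    for x, y in zip(pa, pb):
        if x != y:                      # first pair of components that differ
            return -1 if x < y else 1
    # All shared components match: the shorter identifier (ancestor) is first.
    return (len(pa) > len(pb)) - (len(pa) < len(pb))


print(compare("1.3", "1.5.1"))   # -1: node 306 precedes node 310
print(compare("1.5", "1.5.1"))   # -1: an ancestor precedes its descendant
```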

[0044] It will be appreciated from the foregoing that the hierarchical structure of tree 300 (and thus of the hierarchically-organized data 200 that tree 300 represents) is captured by position identifiers 325. It will further be appreciated that the ORDPATH identifiers shown in FIG. 3 are merely exemplary, and that other types of position identifiers (such as the “ORDKEY” position identifiers shown below) may be used to represent the hierarchical structure of a tree. Moreover, position identifiers need not be represented by numbers, but rather can be represented using any values selected from a space on which an ordering can be meaningfully defined.

[0045] FIG. 4 shows a relation 400 that represents hierarchically-organized data 200, using the position numbering scheme of FIG. 3 to capture the hierarchy. A relation is a table comprising rows and columns. Each column has a name, which is sometimes referred to as an “attribute” (which is different from the “attributes” that may form part of an XML structure, as discussed above). Each row has a value for each column in the relation. A row, in some cases, may be uniquely identified by the values of one or more columns, in which case the one or more columns is referred to as a “key” for the relation. It is known in the art that a relation is a useful data structure for the storage and organization of information. Relations are used for the storage of information in various commercial database systems, such as MICROSOFT SQL Server.

[0046] Relation 400 (or relational table) comprises a plurality of rows 402-414. Each row represents a node in tree 300. For example, row 402 represents node 302, row 404 represents node 304, etc. Relation 400 has a plurality of columns 432-438. Column 432 has the name “ORDPATH,” and represents the ORDPATH position identifiers 325 of the nodes of tree 300. Column 434 contains an integer that identifies the given name for each of the nodes in tree 300 (“DOCUMENT”, “ISBN”, “TITLE”, etc.), by reference to the integer primary keys of a separate table constructed to contain all these names. Column 436 represents the type of node identified in column 434 (ELEMENT, ATTRIBUTE, VALUE, etc.). Column 438 contains, for each row, the value stored at the node. It will be observed that, in some cases, a column may have a null value. For example, the nodes represented by rows 410 and 414 have no tag names in column 434, as it will be recalled that the nodes represented by these rows are not generated directly by the presence of tags in data 200, but rather by the decomposition of the text information delimited by the “SECTION” tags. Similarly, row 408, which represents the “SECTION” node of data 200, has no value in column 438, because the information in the “SECTION” tag is represented by the three child nodes 310, 312, and 314, whose values are stored in rows 410, 412, and 414, respectively.

[0047] Thus, data 200 and its hierarchical structure are captured in relation 400, even though relation 400 is, itself, “flat” (i.e., non-hierarchical). Using the ORDPATH position identifier stored in column 432, the hierarchical structure of data 200 is readily discernible from relation 400 and can be reconstructed from relation 400. Furthermore, with ORDPATH used as the (clustered) primary key of relation 400, the rows will actually sit on disk in the appropriate document order, making searches in a range within the document more efficient.
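A minimal sketch of such a relation, using Python's built-in sqlite3 module in place of a commercial database system, is given below. Several simplifications are assumptions of the sketch and not part of FIG. 4: the table and column names are hypothetical; node names are stored directly rather than by reference to a separate names table; the key is encoded with one byte per component value (workable only for values 0 to 255) so that the database's bytewise BLOB ordering matches document order; and values not quoted in the text are shown as placeholders or left NULL.

```python
import sqlite3


def key(ordpath):
    # Hypothetical encoding: one byte per component, so bytewise order on the
    # BLOB matches document order (valid only for component values 0..255).
    return bytes(int(c) for c in ordpath.split("."))


rows = [
    ("1",     "DOCUMENT", "ELEMENT",   None),                    # value not stated; NULL here
    ("1.1",   "ISBN",     "ATTRIBUTE", "(attribute value 221)"),  # placeholder
    ("1.3",   "TITLE",    "ELEMENT",   "(element value 222)"),    # placeholder
    ("1.5",   "SECTION",  "ELEMENT",   None),
    ("1.5.1", None,       "VALUE",     "All right thinking people"),
    ("1.5.3", "BOLD",     "ELEMENT",   "love"),
    ("1.5.5", None,       "VALUE",     "tree frogs."),
]

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores the rows in primary-key order (a clustered key).
conn.execute(
    "CREATE TABLE node (ordpath BLOB PRIMARY KEY, name TEXT, "
    "node_type TEXT, value TEXT) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [(key(p), n, t, v) for p, n, t, v in rows],
)

# Descendants of "1.5" form one contiguous key range: (key(1.5), key(1.6)).
low, high = key("1.5"), key("1.6")
section_text = [
    v for (v,) in conn.execute(
        "SELECT value FROM node WHERE ordpath > ? AND ordpath < ? ORDER BY ordpath",
        (low, high),
    )
]
print(section_text)
```

The range query illustrates why clustering on the position identifier helps: a subtree is retrieved as a single scan of adjacent rows.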

[0048] Inserting Data into a Hierarchical Structure

[0049] FIGS. 5 and 6 show how data can be inserted (or “careted”) into a hierarchical data structure, while still maintaining the valuable properties of the position identifier numbering scheme described above. In FIG. 5, a tree 500 is shown, whose hierarchical structure is captured by a set of position numbers. In the example of FIG. 5, node 502 has position number “1”, and node 502's child nodes 504, 506, and 508 have position numbers “1.1”, “1.3”, and “1.5”, respectively. It will be observed that only odd numbers are used in the position numbers for nodes 502-508; in a preferred embodiment, even numbers are explicitly omitted from the numbering scheme. A technique is described below that uses the even numbers for insertion of nodes. However, in an alternative embodiment, nodes can be numbered with consecutive integers as long as it is not necessary to perform insertions in a manner that captures the order of a new node relative to its siblings.

[0050] It may be necessary, after tree 500 has been initially constructed, to insert nodes 602 and 604 into tree 500, such that nodes 602 and 604 are child nodes of node 502. Moreover, in the left-to-right ordering among the nodes of tree 500, nodes 602 and 604 may be placed between nodes 504 and 506. In this example, nodes 602 and 604 are assigned position numbers “1.2.1” and “1.2.3”, respectively, now becoming sibling nodes to the right of 504 and to the left of 506. In other words, even-numbered component values are skipped in the initial numbering of the nodes, and are reserved for insertions; the even-numbered component values are then ignored in terms of component depth in the tree, so that the inserted nodes become siblings of nodes with the same number of odd-numbered components. This scheme may be carried out recursively. For example, after nodes 602 and 604 have been inserted into tree 500, it may become necessary to insert node 606 as a further child of node 502 between nodes 602 and 604. Node 606, in this example, receives position number “1.2.2.1.” This numbering scheme can be carried out for an arbitrary number of insertions, although it may require using arbitrarily long position identifiers. Some insertions on the left or the right of all sibling nodes that are children of a given parent will not require any even-numbered components (although insertions to the left of a group of siblings may require a negative odd number; e.g., node 608, which is inserted to the left of the node having position number “1.1”, has position number “1.−1”). If node 610 later needs to be inserted in between nodes 608 and 504, the new node 610 will be numbered “1.0.1” (i.e., “0” is the even number between 1 and −1).
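The properties claimed for inserted nodes can be checked with a short sketch. The helper names parse, depth, and parent are hypothetical, and the minus sign is written in ASCII (“1.-1”) so that it can be parsed. The sketch does not generate new identifiers; it only verifies that the identifiers given above sort into the stated left-to-right order and that even components do not add depth.

```python
def parse(ordpath):
    """Split a dotted identifier into a tuple of integer component values."""
    return tuple(int(c) for c in ordpath.split("."))


def depth(path):
    # Only odd components count toward depth; even components mark insertions.
    return sum(1 for c in path if c % 2 != 0)


def parent(path):
    # Drop the node's own last component, then any trailing even components.
    rest = path[:-1]
    while rest and rest[-1] % 2 == 0:
        rest = rest[:-1]
    return rest


# Identifiers of nodes 502-508 and inserted nodes 602-610, in arbitrary order.
labels = ["1.3", "1.2.3", "1", "1.0.1", "1.5", "1.2.2.1", "1.-1", "1.1", "1.2.1"]

# Component-wise comparison still yields the left-to-right document order.
ordered = sorted(labels, key=parse)
print(ordered)

# Every node other than the root is a child of node "1" at depth two.
children = [parse(l) for l in labels if l != "1"]
print(all(depth(p) == 2 and parent(p) == (1,) for p in children))
```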

[0051] FIGS. 5 and 6 show the use of even and odd numbers such that odd numbers are used to represent nodes, and even numbers are used to represent insertion points. However, it will be appreciated that the use of odd and even numbers is merely exemplary. What is significant about the odd/even numbering scheme is that: (1) it is always possible to distinguish the numbers used for numbering nodes (odd numbers in this example) from numbers that are used to indicate that a node has been inserted (even numbers in this example); and (2) there is always a place to insert between two nodes, because the odd/even numbering scheme ensures that a number is “skipped” between any two nodes. However, it will be understood that any numbering scheme that obeys these properties can be used without departing from the spirit and scope of the invention. For example, nodes could initially be assigned numbers divisible by three (e.g., 3, 6, 9, etc.), with the numbers in between them (e.g., 1, 2, 4, 5, 7, 8, etc.) being used for insertions. A salient feature is that the values used to represent a position are selected from a space of discrete values on which an order can be meaningfully defined (e.g., hexadecimal integers, n-letter words in the Roman alphabet which can be placed in lexical order, etc.), and that only values that are non-adjacent with respect to the ordering are used for the initial assignment of identifiers (so that the unused values in between the non-adjacent values are available for insertions).

[0052] A Preferred Structure for Representing Position Identifiers

[0053] Position identifiers 325 discussed above are represented by a sequence of dot-separated numbers called “components”; while the format of the component values is ambiguous, it gives the impression of being a text string of digits. While such a representation operates to capture the structure of hierarchically-organized data in a “flat” (non-hierarchical) data structure, the following is a preferred structure for representing such position identifiers. The structure described below allows position identifiers to be stored and compared with relatively greater efficiency than the dot-separated numbers described above.

[0054] FIG. 7 shows a preferred data structure 700 for representing a position identifier. Data structure 700 preferably comprises a bit length field 702, a plurality of ordinal length fields 704, a plurality of ordinal value fields 706, and zero to seven “wasted” bits 708, as explained below. Bit length field 702 stores the aggregate number of bits in the plurality of ordinal length fields 704 and ordinal value fields 706. Preferably, the number of bits in the bit length field 702 itself is a multiple of eight (an integral number of bytes) and is not counted in bit length field 702.

[0055] In essence, each pair of ordinal length field 704 and ordinal value field 706 (e.g., the pair of L_(i) and O_(i), i=0 to k) represents one of the dot-separated numbers in the position identifier described above. Specifically, O_(i) corresponds to the value of a dot-separated component value, and L_(i) is the length of O_(i). The aggregate number of bits in ordinal length fields 704 and ordinal value fields 706 might not be divisible by eight (that is, these fields might not add up to a whole number of bytes). Since data structures are typically embodied as a sequence of whole bytes, there may be some “wasted” bits 708 that are not used to represent any L_(i)/O_(i) pair, but are present in data structure 700 to round out the number of bytes to a whole number.

[0056] The following is a more detailed description of data structure 700. In each ordinal component L_(i)/O_(i) of a position identifier, L_(i) gives the length in bits of the succeeding O_(i). The length values L_(i) come from a set of bitstrings that have a “prefix property,” meaning that there is a way to parse through an L_(i) bitstring and recognize both the value represented and when the bitstring ends. In essence, the “prefix” property means that no legitimate value for L_(i) is a prefix for any other legitimate value of L_(i). One way to create numbers that obey the prefix property is to create a binary tree, where each leaf node in the tree is associated with a binary number that represents the path from the root node to the leaf node (with “0” representing a movement from a node to its left child, and “1” representing a movement from a node to its right child). In this case, the bit sequence associated with the leaf node obeys the prefix property, and this bit sequence is interpreted as the number L_(i). This method (and others) of creating numbers that obey the prefix property is known in the art and thus is not discussed at length herein. Given the length represented by L_(i), the following O_(i) value is of a known length, so it is known where a given L_(i)/O_(i) pair ends and the next one (i.e., L_(i+1)/O_(i+1)) begins. The particular lengths represented by the L_(i) values are preferably chosen to minimize the average length of the L_(i)/O_(i) component for the expected number of children at any level of the underlying hierarchical structure that is to be captured in the position identifiers.

[0057] The following table shows an exemplary set of L_(i) values, and the prefix-property-obedient bit sequences that represent them:

L_(i)    Bit sequence
−48      00000001
−32      0000001
−16      000001
−12      000010
−8       000011
−6       00010
−4       00011
−3       001
3        01
4        100
6        101
8        1100
12       1101
16       1110
32       11110
48       11111
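The prefix property of the exemplary table can be checked mechanically. The following Python sketch is illustrative only; the names `LENGTH_CODES`, `is_prefix_free`, and `read_length` are hypothetical and bit sequences are held as text strings of “0” and “1” for readability rather than as packed bits. It also checks a second property of this particular table, namely that the bit sequences sort in the same order as the L_(i) values they represent.

```python
# Exemplary L_i values and their bit sequences, copied from the table above.
LENGTH_CODES = {
    -48: "00000001", -32: "0000001", -16: "000001", -12: "000010",
    -8: "000011", -6: "00010", -4: "00011", -3: "001",
    3: "01", 4: "100", 6: "101", 8: "1100",
    12: "1101", 16: "1110", 32: "11110", 48: "11111",
}


def is_prefix_free(codes) -> bool:
    """True if no code is a prefix of a different code."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def read_length(bits: str, pos: int = 0) -> tuple[int, int]:
    """Return (L_i, position after the code) for the code starting at bits[pos]."""
    for length, code in LENGTH_CODES.items():
        if bits.startswith(code, pos):
            return length, pos + len(code)   # unique because the set is prefix-free
    raise ValueError("no length code matches at this position")


prefix_free = is_prefix_free(LENGTH_CODES.values())
# Sorting the L_i values by their bit sequences gives the same order as
# sorting them numerically.
order_preserved = sorted(LENGTH_CODES, key=LENGTH_CODES.get) == sorted(LENGTH_CODES)
```

For instance, `read_length("110000001011")` recognizes the code 1100 and reports a length of 8 with the ordinal bits starting at position 4.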

[0058] Note that the negative L_(i) values, −48 to −3, in fact represent ordinal lengths of 3 to 48, but indicate that the O_(i) values will be interpreted as negative numbers, whereas the L_(i) values 3 to 48 will preface ordinals O_(i) that represent positive numbers. (As discussed above, it may be necessary to represent certain dot-separated numbers (corresponding to the O_(i) in this example) as negative numbers, in the case where information is inserted into the hierarchy to the left of existing data.) The following discussion concentrates on the positive L_(i) values.

[0059] When position identifiers are assigned to nodes in a hierarchy, odd ordinal numbers are used for successive sibling children of any node (e.g., 1, 3, 5, 7, . . . ); as discussed above, this use of odd numbers facilitates the insertion of information into an existing structure by using even numbers to represent the insertion. In this example, lengths of zero, one and two are not assigned prefix-property-obedient bit strings; thus, all O_(i) have lengths of at least three bits. Using an L_(i) value of 3, it is possible to represent four odd ordinal values (i.e., 1, 3, 5, 7, corresponding to the bit strings 001, 011, 101, and 111) in a component O_(i). For example, the following is a representation of the position identifier “3.1.7.5” in the format of FIG. 7:

Field     Bits        Value
Bitlen    00010100    20
L₀        01          3
O₀        011         3
L₁        01          3
O₁        001         1
L₂        01          3
O₂        111         7
L₃        01          3
O₃        101         5
W         0000        (wasted bits)

[0060] Preferably, the minimum length L_(i) is used to represent an ordinal value O_(i), so the value 9 (or 8) requires L_(i)=4. In this case, it is not necessary to have repetitive representations of O_(i) values; that is, there is no need to be able to represent the value 3 both in a three-bit string and a four-bit string. Thus, for L_(i)=4, the ordinal value 8 is represented by 0000, 9 is represented by 0001, etc. Under this scheme, the following table shows the O_(i) values that can be represented by the various values of L_(i):

L_(i)                Integers represented in O_(i)                        Map bit representation b_(m)b_(m−1)b_(m−2) . . . to value O_(i)
3                    0 to 7 (2³ − 1)                                      O_(i) value is integer represented by bits b₂b₁b₀
4                    8 to 23 (2³ + 2⁴ − 1)                                O_(i) value is 8 + b₃b₂b₁b₀
6                    24 to 87 (2³ + 2⁴ + 2⁶ − 1)                          O_(i) value is 24 + b₅b₄b₃b₂b₁b₀
8, 12, 16, 32, 48    88 to 88 + 2⁸ + 2¹² + . . . + 2⁴⁸ − 1                At L_(i) = 48, O_(i) value is 88 + 2⁸ + 2¹² + 2¹⁶ + 2³² + b₄₇ . . . b₀

[0061] The above table uses 48 as the highest possible value for L_(i), although it will be appreciated that a position numbering scheme can be designed with arbitrarily large L_(i) values.

[0062] Under the exemplary number scheme described above, the following is a representation of the position identifier “7.99.1.17.87” in accordance with the format of FIG. 7:

Field     Bits        Value
Bitlen    00100110    38
L₀        01          3
O₀        111         7
L₁        1100        8
O₁        00001011    99 (11 + 88)
L₂        01          3
O₂        001         1
L₃        100         4
O₃        1001        17 (9 + 8)
L₄        101         6
O₄        111111      87 (63 + 24)
W         00          (wasted bits)
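The encoding of non-negative ordinals can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function names `encode_component` and `encode` are hypothetical, the bit length field is assumed to occupy a single byte (the text only requires a multiple of eight bits), wasted bits are assumed to be zero, and the output is a text string of “0” and “1” so that it can be compared directly with the examples above.

```python
# Positive L_i values and their bit sequences, in increasing order of length.
LENGTH_CODES = {3: "01", 4: "100", 6: "101", 8: "1100",
                12: "1101", 16: "1110", 32: "11110", 48: "11111"}


def encode_component(value: int) -> str:
    """Encode one non-negative ordinal as its L_i code followed by its O_i bits."""
    if value < 0:
        raise ValueError("this sketch handles non-negative ordinals only")
    base = 0                                     # smallest value for this length
    for length, code in LENGTH_CODES.items():    # dicts keep insertion order
        if value < base + (1 << length):
            return code + format(value - base, f"0{length}b")
        base += 1 << length                      # next length starts where this ends
    raise ValueError("ordinal too large for the exemplary table")


def encode(label: str) -> str:
    """Encode a dot-separated position identifier in the layout of FIG. 7."""
    body = "".join(encode_component(int(c)) for c in label.split("."))
    if len(body) > 255:
        raise ValueError("body does not fit the assumed one-byte bit length field")
    wasted = "0" * (-len(body) % 8)              # pad to a whole number of bytes
    return format(len(body), "08b") + body + wasted


example_a = encode("3.1.7.5")
example_b = encode("7.99.1.17.87")
```

Because each length takes over exactly where the previous one ends, every ordinal has a single representation, which is the property the comparison technique relies on.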

[0063] L_(i) may be encoded separately for each position identifier and might be different for components on the same level, even for sibling children: for example, one sibling may have an ordinal value of 7 in its final component, and the next sibling an ordinal value of 9. Because of this flexibility, a sequence of increasing ordinal numbers for children of a given node can be inserted at any time, with increasing L_(i) as needed. It is also possible to insert a sequence of decreasing (negative) ordinals using the “negative” L_(i) values shown above. The following table shows corresponding O_(i) values for those negative L_(i) values.

L_(i)                    Integers represented in O_(i)                    Map bit representation b_(m)b_(m−1)b_(m−2) . . . to value O_(i)
−3                       −1 to −8 = −(2³)                                 O_(i) value is −8 + b₂b₁b₀
−4                       −9 to −24 = −(2³ + 2⁴)                           O_(i) value is −24 + b₃b₂b₁b₀
−6                       −25 to −88 = −(2³ + 2⁴ + 2⁶)                     O_(i) value is −88 + b₅b₄b₃b₂b₁b₀
−8, −12, . . . , −48     −89 to −88 − 2⁸ − 2¹² − . . . − 2⁴⁸              At L_(i) = −48, O_(i) value is −88 − 2⁸ − 2¹² − . . . − 2⁴⁸ + b₄₇ . . . b₀

[0064] The following is a representation of the position identifier “−7.99.1.−17.87” in accordance with the format of FIG. 7.

Field     Bits        Value
Bitlen    00101001    41
L₀        001         −3
O₀        001         −7
L₁        1100        8
O₁        00001011    99 (11 + 88)
L₂        01          3
O₂        001         1
L₃        00011       −4
O₃        0111        −17 (7 − 24)
L₄        101         6
O₄        111111      87 (63 + 24)
W         0000000     (wasted bits)
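Decoding reverses the process and also covers the negative lengths. The Python sketch below is illustrative only: the names `base` and `decode` are hypothetical, a one-byte bit length field is assumed, and the input is a text string of “0” and “1”. The smallest ordinal for each signed length is computed from the pattern visible in the first rows of the two tables (0, 8, 24, 88 and −8, −24, −88); extending that pattern to the longer lengths is an assumption of this sketch.

```python
LENGTH_CODES = {
    -48: "00000001", -32: "0000001", -16: "000001", -12: "000010",
    -8: "000011", -6: "00010", -4: "00011", -3: "001",
    3: "01", 4: "100", 6: "101", 8: "1100",
    12: "1101", 16: "1110", 32: "11110", 48: "11111",
}
POSITIVE_LENGTHS = [3, 4, 6, 8, 12, 16, 32, 48]


def base(length: int) -> int:
    """Smallest ordinal representable with this signed length."""
    if length > 0:
        return sum(1 << n for n in POSITIVE_LENGTHS if n < length)
    return -sum(1 << n for n in POSITIVE_LENGTHS if n <= -length)


def decode(bits: str) -> list[int]:
    """Decode a FIG. 7 style bit string into its list of ordinals."""
    end = 8 + int(bits[:8], 2)        # Bitlen counts only the L_i/O_i bits
    pos, ordinals = 8, []
    while pos < end:                  # wasted bits past `end` are never parsed
        for length, code in LENGTH_CODES.items():
            if bits.startswith(code, pos):
                break                 # unique match: the codes are prefix-free
        else:
            raise ValueError("invalid length code")
        pos += len(code)
        width = abs(length)
        ordinals.append(base(length) + int(bits[pos:pos + width], 2))
        pos += width
    return ordinals


encoded = "00101001 001001 1100 00001011 01 001 00011 0111 101 111111 0000000"
decoded = decode(encoded.replace(" ", ""))
```

Applied to the example above, the sketch recovers the ordinals −7, 99, 1, −17 and 87.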

[0065] Comparison of Position Identifiers

[0066] When position identifiers are represented in the format of FIG. 7, it is possible to compare two position identifiers using the algorithm described below.

[0067] Consider two distinct ORDPATHs, X and X′, shown below:

X     Bitlen     L₀ O₀      L₁ O₁      . . .    L_(k) O_(k)      . . .
X′    Bitlen′    L′₀ O′₀    L′₁ O′₁    . . .    L′_(k) O′_(k)    . . .

[0068] It is possible to perform a byte-by-byte comparison of the X and X′ strings, starting after the Bitlen element, and running for the byte-length of the shorter string (this is (min(Bitlen, Bitlen′)+7)/8 bytes, using integer division). If at some point during the comparison it is found that X<X′ (that is, a given byte of X has a lower value than the corresponding byte of X′), then X comes earlier in document order than X′. (This may also mean that X is an ancestor of X′ if X has the shorter byte-length.)

[0069] This type of comparison works for the following reason: if, during the byte-by-byte comparison, the first non-equality between X and X′ is discovered in an L_(i) value and L_(i)<L′_(i), then, as described in the tables above, O_(i) is less than O′_(i); that is, since O_(i) values do not have repetitive representations in different lengths, a greater length implies a greater O_(i) value range. As described above in connection with the position identifier numbering scheme, a lower value for O_(i) (assuming that all previous values for O_(i) and O′_(i) have been equal) implies that O_(i) appears earlier than O′_(i) in document order. On the other hand, if the first difference between X and X′ is discovered in an O_(i) value and O_(i)<O′_(i), then the ordinal values are known directly (rather than by inference from their length), and their L_(i) values match, so it is again known that X precedes X′.

[0070] It should be noted that this document-order comparison works even when wasted bits 708 are used in the comparison. In other words, the ordinary byte-by-byte binary string comparison is sufficient for document-order determination. However, ancestry information may be determined along with document-order information, and in this case, the byte containing the W bits needs special handling. If the wasted bits W at the end of the shorter string (say it is X) contain one or more zero bits, then even if the final L_(k)/O_(k) of the shorter string compares equal to the L′_(k)/O′_(k) of the longer string, W is likely to compare low to the L_(i) that begins at this point in X′. In determining ancestry, therefore, it is only necessary to compare the leftmost min(Bitlen, Bitlen′) % 8 bits under an unsigned mask of the last byte, masking out W, and if there is an equal match for the full length of the shorter string X, then X is an ancestor of X′.

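[0071] Using the above-described logic, a cmpordp(X, Y) function on two ORDPATHs X and Y can be created which returns (M, N), where M and N represent the following results:

[0072] If M<0, M=0, or M>0, then X is shorter than, the same length as, or longer than Y, respectively. If N<0, N=0, or N>0, then X compares less than, equal to, or greater than Y, respectively, over the length of the shorter of the two (with wasted bits masked out).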

[0073] Ancestry can be determined according to the following rules:

[0074] X is an ancestor of Y if and only if N=0 and M<0;

[0075] X is the same node as Y if and only if N=0 and M=0;

[0076] Y is an ancestor of X if and only if N=0 and M>0;

[0077] Document order can be determined according to the following rules:

[0078] X precedes Y in document order if and only if N<0 or (N=0 and M<0);

[0079] X is the same node as Y if and only if N=0 and M=0 (see same rule above regarding ancestry);

[0080] Y precedes X in document order if and only if N>0 or (N=0 and M>0).
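The ancestry and document-order rules listed above can be written as a small decision function. The Python sketch below is illustrative; the name `interpret` and the returned strings are hypothetical and are chosen only to make the rules easy to read.

```python
def interpret(m: int, n: int) -> tuple[str, str]:
    """Map the (M, N) result of cmpordp to (ancestry, document order)."""
    # Ancestry is only defined when the compared bits are equal (N = 0).
    if n == 0 and m < 0:
        ancestry = "X is an ancestor of Y"
    elif n == 0 and m == 0:
        ancestry = "X is the same node as Y"
    elif n == 0 and m > 0:
        ancestry = "Y is an ancestor of X"
    else:
        ancestry = "neither is an ancestor of the other"

    # Document order: N decides first, M breaks the tie when N = 0.
    if n < 0 or (n == 0 and m < 0):
        order = "X precedes Y"
    elif n == 0 and m == 0:
        order = "X is the same node as Y"
    else:
        order = "Y precedes X"
    return ancestry, order
```

For example, `interpret(-1, 0)` reports that X is an ancestor of Y and that X precedes Y, which reflects the fact that an ancestor always comes before its descendants in document order.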

[0081] FIG. 8 describes an exemplary process for carrying out the cmpordp(X, Y) function. At step 802, the bit lengths of X and Y (i.e., bitlen fields 702, shown in FIG. 7) are compared. If the bit length of X is less than the bit length of Y, then B is set to bitlen(X)/8, and M is set to −1 (step 804). (B represents the number of bytes of X and Y to be compared at step 810, as described below.) If the bit length of X is equal to the bit length of Y, then B is set to (bitlen(X)+7)/8, and M is set to 0 (step 806). If the bit length of X is greater than the bit length of Y, then B is set to bitlen(Y)/8, and M is set to 1 (step 808). Regardless of whether decisional step 802 leads to step 804, 806, or 808, the process continues to step 810.

[0082] At step 810, a byte-by-byte comparison is performed of the first B bytes of X and Y, and the flow proceeds to either block 812, 814, or 816 according to the result of the comparison. Specifically, bytes of X and Y are compared from left to right until a byte is identified in X whose value is different from the corresponding byte in Y, or until all B bytes have been compared. If the identified byte in X is less than the corresponding byte of Y, then the process continues to step 812, where N is set to −1, and D is set to −1. (D represents a conclusion as to the relative document order of X and Y; X precedes, is the same node as, or follows Y according to whether D=−1, 0, or 1. D may be returned as an additional result of cmpordp (i.e., cmpordp may return the tuple (M, N, D)), or D may be deduced from M and N as described in the rules above.) If, on the other hand, the identified byte in X is greater than the corresponding byte in Y, then the process continues to step 816, where N is set to 1 and D is set to 1. If the comparison proceeds through all B bytes without any difference having been found between X and Y, then the process continues to step 814.

[0083] At step 814, if M=0, then the process continues to step 818, where N is set to 0 and D is set to 0. If M is not equal to 0, then the process continues to step 820, where certain bits of X and Y are compared. Specifically, the last byte in either X or Y (whichever is shorter) is identified. Within this byte, the bits that precede “wasted” bits W are identified. Step 820 compares X and Y with regard to these identified bits. Since the bits to be compared may be less than one full byte, the comparison can be performed in practice by constructing a mask and performing a bitwise logical AND between the relevant byte and the mask. If the identified bits are less in X than in Y, the process continues to step 822, where N is set to −1, and D is set to −1. If the identified bits are equal in X and Y, then the process continues to step 824, where N is set to 0, and D is set to the value of M. If the identified bits in X are greater than the identified bits in Y, then the process continues to step 826, where N is set to 1 and D is set to 1.

[0084] When this process has been carried out, if N=0, then X is an ancestor of Y if M<0, Y is an ancestor of X if M>0, and X is the same node as Y if M=0. Moreover, X precedes Y in document order if D=−1, Y precedes X in document order if D=1, and X is the same node as Y if D=0.
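The process of FIG. 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: each ORDPATH is passed as its bit length together with its payload bytes (the bytes that follow the Bitlen field), wasted bits are assumed to be zero, and the case in which the shorter bit length is an exact multiple of eight (so that no partial byte remains to be masked) is treated as an equal match at step 820, which the text does not spell out.

```python
def cmpordp(x_bitlen: int, x: bytes, y_bitlen: int, y: bytes) -> tuple[int, int, int]:
    """Return (M, N, D) for two ORDPATH payloads, following steps 802 to 826."""
    # Steps 802-808: compare bit lengths, choose the number of whole bytes B.
    if x_bitlen < y_bitlen:
        b, m = x_bitlen // 8, -1
    elif x_bitlen == y_bitlen:
        b, m = (x_bitlen + 7) // 8, 0
    else:
        b, m = y_bitlen // 8, 1

    # Step 810: byte-by-byte comparison of the first B bytes.
    for xb, yb in zip(x[:b], y[:b]):
        if xb != yb:
            n = -1 if xb < yb else 1      # steps 812 / 816
            return m, n, n

    # Steps 814 / 818: equal lengths and equal bytes mean the same node.
    if m == 0:
        return 0, 0, 0

    # Step 820: compare only the bits that precede the wasted bits W.
    rem = min(x_bitlen, y_bitlen) % 8
    if rem == 0:                          # assumption: nothing left to compare
        return m, 0, m
    mask = (0xFF << (8 - rem)) & 0xFF     # keep the leftmost `rem` bits
    xb, yb = x[b] & mask, y[b] & mask
    if xb < yb:
        return m, -1, -1                  # step 822
    if xb > yb:
        return m, 1, 1                    # step 826
    return m, 0, m                        # step 824


# Hand-encoded payloads: "1" = 01001, "1.3" = 01001 01011, "1.5" = 01001 01101.
P1 = (5, bytes([0b01001000]))
P13 = (10, bytes([0b01001010, 0b11000000]))
P15 = (10, bytes([0b01001011, 0b01000000]))
```

With these payloads, `cmpordp(*P1, *P13)` yields (−1, 0, −1), meaning that “1” is an ancestor of “1.3” and precedes it, while `cmpordp(*P13, *P15)` yields (0, −1, −1), meaning that “1.3” precedes its sibling “1.5” without being its ancestor.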

[0085] Alternative Representations of Position Identifiers

[0086] In some hierarchical structures, such as XML trees, it may occur that a small number of nodes exist at very deep levels of nesting, and thus the position identifiers for these nodes might have overly long primary keys. This would increase the size of secondary indexes that use primary keys to identify rows indexed. An alternative position identifier called an “ORDKEY” may be used to reduce the size of position identifiers.

[0087] Such a reduced-size position identifier for an existing tree may be created by passing through the nodes of the tree in pre-order, and generating only a single L₀/O₀ pair, with ordinal values 1, 2, 3, etc., for all nodes, regardless of ancestry. This numbering scheme preserves document order but ignores ancestry.

[0088] Later insertions within the tree can be performed using a variant of the insertion technique described above; that is, a sequence of nodes between nodes numbered 2 and 3 will have numbers 2.1, 2.2, 2.3, etc. It is not necessary to use the odd-and-even numbering scheme described above because the “flat” nature of an “ORDKEY” numbering scheme causes the sibling/child relationship to be undifferentiated, so everything can be considered a sibling. Note that inserting a sub-tree of nodes in any order other than pre-order generates a multi-level position identifier (i.e., with many L_(i)/O_(i) values for a given position identifier). Flattening the subtree in this case requires local reorganization.

[0089] Since ancestry information is not implicit in an “ORDKEY” representation, a “last descendent” concept may be used to supply ancestry information to “ORDKEY”-labeled nodes. If the position identifier for a node N is represented by ID, and LD represents the “Last Descendent” of N, then a node M having position identifier ID* will be a descendent of N if and only if: ID<ID*<=LD. This is called the “LD property”. We would normally think of ID and LD as existing as a primary key pair in the same node (ID is unique, but LD adds to the information carried in the primary key). This technique requires maintaining an LD value for each node. A preferred method for generating LD values during a document order load of the nodes is described below.

[0090] In order to generate LD values, the tree is traversed in document pre-order. Pre-order of the nodes of the tree is leftmost depth first, ascending after there is no lower leftmost node to go to, and then advancing to the next node in document order, which is still the leftmost lowest node still unvisited. The next available ID is assigned to the node as a “descendant limit” (DL) value as the tree is descended to the right (skipping over DL values that have already been set). We keep track of the rightmost nodes at each level in the descending path to the last (and deepest) one inserted. All these nodes have ID assigned, but have not yet been assigned DL values; all prior nodes not on the rightmost path also have DL assigned. When we go UP in pre-order (because there are no more left-hand siblings below), we immediately assign the next ID value in order as a DL, and use that DL for the node we're leaving at the lowest level and every node we pass while going UP. As soon as we start down again (with a right-hand child of a higher-level node reached while going UP), we skip over the DL we've been using in order to assign the next ID.

[0091] At any time we need only keep a stack of nodes in the rightmost path. These all have IDs assigned, but are not yet assigned DLs. When the final rightmost node is placed in the document, the tree will only be ascended thereafter, generating a single DL that is used for every node we pass on the way up and finally for the root as well.
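The pre-order assignment of ID and DL values can be sketched in Python as follows. This is one possible reading of the procedure described above, offered under stated assumptions: the names `assign_ids` and `is_descendant` are hypothetical, the tree is given as nested `(name, [children])` tuples with unique names, recursion stands in for the explicit stack of rightmost-path nodes, and the DL value is used in place of LD in the descendant test (which is safe in this sketch because a DL value is never reused as an ID).

```python
def assign_ids(tree):
    """Return {name: (ID, DL)} for a tree given as (name, [children])."""
    labels = {}
    counter = 0
    pending_dl = None        # DL shared by all nodes left during one ascent

    def visit(node):
        nonlocal counter, pending_dl
        name, children = node
        pending_dl = None    # starting down again: skip past the DL in use
        counter += 1
        node_id = counter    # IDs are handed out in pre-order
        for child in children:
            visit(child)
        if pending_dl is None:
            counter += 1     # first step UP after a descent: take a new DL
            pending_dl = counter
        labels[name] = (node_id, pending_dl)

    visit(tree)
    return labels


def is_descendant(labels, ancestor, node) -> bool:
    """Range test in the form ID < ID* <= LD, using DL as the upper bound."""
    anc_id, anc_dl = labels[ancestor]
    return anc_id < labels[node][0] <= anc_dl


# Hypothetical example tree: A has children B and E; B has children C and D.
tree = ("A", [("B", [("C", []), ("D", [])]), ("E", [])])
labels = assign_ids(tree)
```

For this example the sketch yields A=(1, 8), B=(2, 6), C=(3, 4), D=(5, 6) and E=(7, 8): D and B share the DL value 6 because both are left during the same ascent, and the ID 5 given to D skips over the DL value 4 already given to C.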

[0092] As the tree is traversed, an ID can be placed in all secondary index entries, and there may also be an index correlating ID to DL, which is indeterminate on the rightmost path at any time. The DL values for the rightmost path IDs may be filled in as infinity, since there is nothing to the right of the rightmost path, and everything will work for range searches.

[0093] The above-described technique can be generalized to subtrees, except that the DL for nodes having the subtree root on the rightmost descendent path will be indeterminate as well, and will have to be reset when the subtree insert is complete.

[0094] The “ORDKEY” approach to position numbering can be combined with the “ORDPATH” position numbering scheme described above in numerous ways to create hybrid architectures, obviating the need for determining a Last Descendent LD or Descendent Limit DL for each ORDKEY ID. For example, an “ORDKEY” identifier may be used as a primary key, with a parallel “ORDPATH” identifier provided in the node itself or as a node in an auxiliary table that has the “ORDKEY” as its primary key, to be accessed when ancestry information is desired.

[0095] Other hybrid schemes may be created as well, and are within the spirit and scope of the invention.

[0096] It is noted that the foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting of the present invention. While the invention has been described with reference to various embodiments, it is understood that the words which have been used herein are words of description and illustration, rather than words of limitation. Further, although the invention has been described herein with reference to particular means, materials and embodiments, the invention is not intended to be limited to the particulars disclosed herein; rather, the invention extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Those skilled in the art, having the benefit of the teachings of this specification, may effect numerous modifications thereto and changes may be made without departing from the scope and spirit of the invention in its aspects.

What is claimed is:
 1. A computer-readable medium having encoded thereon a data structure which represents hierarchically-organized data, said hierarchically-organized data having at least a first node at a first level and a plurality of second nodes at a second level, the second nodes being child nodes of the first node, the first and second nodes each having a corresponding data item associated therewith, the data structure comprising: a plurality of rows each having a plurality of fields, each of said rows corresponding to a data item associated with a one of the first and second nodes, the fields of each row comprising: a first field which stores the data item associated with the one of the nodes that corresponds to the row; and a second field which stores a position identifier which identifies the level at which the node that corresponds to the row is located in the hierarchically-organized data, and which further indicates one of: (a) an identity of an ancestor node of the node that corresponds to the row, or (b) the fact that the node that corresponds to the row has no ancestor.
 2. The computer-readable medium of claim 1, wherein the data structure comprises a relation in a relational database.
 3. The computer-readable medium of claim 1, wherein the hierarchically-organized data comprises data in a hierarchical markup language.
 4. The computer-readable medium of claim 3, wherein said hierarchical markup language comprises Extensible Markup Language (XML).
 5. The computer-readable medium of claim 4, wherein the fields of each row further comprise: a name identifier identifying a user-assigned XML name; and a data type.
 6. The computer-readable medium of claim 1, wherein the position identifier of the first node comprises a first value in a space of ordered values, and wherein the position identifiers of each of the second nodes comprises said first value and a second value in said space of ordered values.
 7. The computer-readable medium of claim 6, wherein an order is defined among the second nodes, and wherein the second values associated with the second nodes are respective of said order with respect to said space of ordered values.
 8. The computer-readable medium of claim 7, wherein said space of ordered values comprises the set of integers, wherein said first value is a “1”, and wherein the second values for the second nodes are integers in an increasing series of integers.
 9. The computer-readable medium of claim 7, wherein said space of ordered values comprises the set of integers, wherein said first value is a “1”, and wherein the second values for the second nodes are odd integers in an increasing series of integers.
 10. The computer-readable medium of claim 9, wherein said hierarchically-organized data comprises a third node which is a child of said first node and which is located between first and second ones of said second nodes, said third node having a position identifier associated therewith, and wherein the position identifier for the third node comprises said first value, an even number between the second values associated with said first and second ones of said second nodes, and an odd number.
 11. The computer-readable medium of claim 6, wherein said ordered values are represented in a form comprising: a length indicator selected from a length-indicator space having non-uniform numbers of bits, each length indicator in said length-indicator space having a prefix property such that no member of said length-indicator space is a prefix of any other member of said length-indicator space; and an ordinal indicator having a length indicated by said length indicator.
 12. The computer-readable medium of claim 11, wherein each of the position identifiers comprises a bit length field indicative of the aggregate number of bits for the length indicators and the position identifiers in the position identifier of the corresponding bit length field.
 13. A method of representing hierarchically-organized data, the hierarchically-organized data comprising at least a first node at a first level and a plurality of second nodes at a second level, the second nodes being child nodes of the first node, an order being defined among the second nodes, the first and second nodes each having a data item associated therewith, the method comprising: assigning a first position identifier to the first node, wherein said first position identifier comprises a first value selected from an ordered space of values; assigning a second position identifier to each of the second nodes, each of the second identifiers comprising said first value and a second value selected from said ordered space of values, wherein the second values are assigned to the second nodes respectively of the order; and storing, in a non-hierarchical data structure, a plurality of data records, wherein each of the plurality of data records corresponds to one of the first or second nodes, and wherein each data record includes, for its corresponding node: the position identifier associated with the node; and the data item associated with the node.
 14. The method of claim 13, wherein said non-hierarchical data structure comprises a relational database, and wherein each of said data records comprises a row of a relation in said relational database.
 15. The method of claim 13, wherein said hierarchically-organized data comprises data in Extensible Markup Language (XML) having a plurality of tags, each of said tags delimiting a portion of the hierarchically-organized data, said tags being nestable, and each of said tags and its delimited data corresponding to one or more nodes in the hierarchically-organized data.
 16. A system for storing hierarchically-organized data, the hierarchically-organized data comprising at least a first node at a first level and a plurality of second nodes at a second level, the second nodes being child nodes of the first node, each of the first and second nodes having a data item associated therewith, the system comprising: a relational table having a plurality of rows, each of said rows corresponding to a node in the hierarchically-organized data, the relational table having a plurality of columns, the columns comprising: a first column which stores a position identifier indicative of the level at which the node that corresponds to the row is located in the hierarchically-organized data, and further indicative of an ancestor of the node that corresponds to the row; and a second column which stores the data item associated with the node that corresponds to the row; and a relational table manager which inserts and retrieves the rows from the relational database.
 17. The system of claim 16, wherein said relational table manager comprises a database management system.
 18. The system of claim 16, wherein the hierarchically-organized data comprises Extensible Markup Language (XML) data.
 19. The system of claim 16, wherein said XML data comprises a plurality of tags which delimit portions of the hierarchically-organized data, said tags being nestable, wherein each of said tags and its corresponding delimited data corresponds to one or more nodes in the hierarchically-organized data, and at least some of the rows in said relational table further comprise: a tag identifier indicative of the tag associated with the node that corresponds to the row; and a type identifier indicative of a type of the data delimited by the tag indicated by the row's tag identifier.
 20. A method of inserting a new node into a hierarchically-organized data structure, the hierarchically-organized data structure comprising at least a first node at a first level and a plurality of second nodes at a second level, the second nodes being child nodes of the first node, an order being defined among the second nodes, each of the nodes having a data item associated therewith, the first node having a first position identifier which comprises a value selected from a space of ordered discrete values, the second nodes having second position identifiers, each of the second position identifiers comprising the first value and a second value selected from a first subset of the space of ordered discrete values, the first subset consisting of non-adjacent values in the space of ordered discrete values, the second values assigned to each of the second nodes being respective of the order, the method comprising the steps of: receiving the new node and an indication of a position in the hierarchically-organized data structure into which the new node is to be inserted, said position being located between first and second ones of the second nodes with respect to the order; and assigning the new node a third position identifier which comprises: said first value; a third value selected from a second subset of the space of ordered discrete values, the second subset consisting of the difference between the space of ordered discrete values and the first subset, said third value being located between the second values associated with said first and second ones of said second nodes with respect to the order; and a fourth value selected from the first subset.
 21. The method of claim 20, further comprising: storing in a relation a row which comprises: said third position identifier; and a data item associated with the new node.
 22. The method of claim 21, wherein said relation further stores a row corresponding to each of the first and second nodes, each of the rows comprising the position identifier of the corresponding first or second node, and the data item of the corresponding first or second node.
 23. The method of claim 20, wherein said hierarchically organized data comprises Extensible Markup Language (XML) data comprising a plurality of tags which delimit portions of the hierarchically-organized data, said tags being nestable, wherein each of said tags and its corresponding delimited data corresponds to one or more nodes in the hierarchically-organized data.
 24. A computer-readable medium havingcomputer-executable instructions, the instructions operating onhierarchically-organized data, the hierarchically-organized data havingat least a first node at a first level and a plurality of second nodesat a second level, the second nodes being child nodes of the first node,each of the first and second nodes having a data item associatedtherewith, the instructions being adapted to perform acts comprising:traversing the hierarchically-organized data in pre-order; assigning anidentifier to each of the first and second nodes in a sequencecorresponding to an order in which the first and second nodes areencountered during said traversing act; storing a plurality of recordsin a non-hierarchical data structure, each of said records correspondingto one of the first or second nodes, each of said records comprising:the identifier assigned to the node that corresponds to the row; and thedata item associated with the node that corresponds to the row;receiving a new node to be inserted into the hierarchically-organizeddata; assigning, to the new node, a new identifier which comprises theidentifier assigned to a one of the first and second nodes and a value;and storing, in the non-hierarchical data structure, a new recordcorresponding said new node, said new record comprising said newidentifier and a data item associated with said new node.
 25. The computer-readable medium of claim 24, further comprising: receiving an indication that said new node is to be either a child or a sibling of said one of said second nodes.
 26. The computer-readable medium of claim 24, wherein said assigning act comprises assigning successive integers to each of the nodes encountered in said traversing act.
 27. The computer-readable medium of claim 24, wherein said hierarchically-organized data comprises a tree data structure.
 28. The computer-readable medium of claim 27, wherein said tree data structure comprises Extensible Markup Language (XML) data having a plurality of portions delimited by a plurality of tags, each of the tags and its delimited data corresponding to one or more nodes in the hierarchically-organized data structure.
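The acts recited in claims 24 through 28 can be illustrated with a minimal Python sketch. It is offered only as one possible reading, not as the claimed implementation. The function names `preorder_records` and `insert_after`, the use of Python tuples as identifiers, the use of a sorted list as the non-hierarchical data structure, and the use of each tag name as the node's data item are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def preorder_records(root):
    """Traverse the tree in pre-order and return flat (identifier, data) records.

    Identifiers are successive integers wrapped in one-element tuples
    (an illustrative assumption) so that a later insertion can extend them.
    """
    records = []
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        counter += 1
        records.append(((counter,), node.tag))
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(list(node)))
    return records


def insert_after(records, base_id, value, data):
    """Store a new record whose identifier is an existing identifier plus a value."""
    new_id = base_id + (value,)
    records.append((new_id, data))
    # Tuple comparison keeps the new record directly after its base record.
    records.sort(key=lambda record: record[0])
    return new_id


doc = ET.fromstring("<a><b><d/></b><c/></a>")
records = preorder_records(doc)  # identifiers (1,), (2,), (3,), (4,)
new_id = insert_after(records, (3,), 1, "e")
```

In this sketch the new identifier `(3, 1)` sorts after `(3,)` and before `(4,)`, so the existing identifiers do not need to be renumbered when the new node is inserted.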
 29. A method of comparing the relative position of first and second nodes in a hierarchically-organized data structure, the hierarchically-organized data structure comprising at least a first node at a first level, a plurality of second nodes at a second level, and a plurality of third nodes at a third level, a first group of the third nodes being child nodes of a first one of the second nodes, and a second group of the third nodes being child nodes of a second one of the second nodes, an order being defined among the first, second, and third nodes, the method comprising: assigning a first identifier to the first node, said first identifier comprising a first value selected from a space of ordered values; assigning a plurality of second identifiers to the second nodes, the identifier for each of the second nodes comprising said first value and a second value selected from said space of ordered values, said second values being assigned to the second nodes respective of the order; assigning a plurality of third identifiers to the first group of the third nodes, the third identifier for each of the first group of third nodes comprising said first value, the second value associated with the first one of the second nodes, and a third value selected from said space of ordered values, said third values being assigned to the first group of third nodes respective of the order; assigning a plurality of third identifiers to the second group of the third nodes, the third identifier for each of the second group of third nodes comprising said first value, the second value associated with the second one of the second nodes, and a third value selected from said space of ordered values, said third values being assigned to the first group of third nodes respective of the order; identifying a fourth and fifth node from among the first, second, and third nodes; comparing corresponding portions of the identifiers of said fourth and fifth nodes until a pair of corresponding non-equal identifiers is encountered; determining that the encountered portion in the fourth node appears earlier in the space of ordered values than the corresponding encountered portion in the fifth node; and determining that the fourth node appears earlier in the order than the fifth node.
 30. The method of claim 29, wherein said space of ordered values comprises the integers.
 31. The method of claim 29, wherein the identifiers associated with each of the nodes comprises one or more variable-length bit sequences, and wherein said comparing act comprises performing a bytewise comparison of the identifiers associated with the fourth and fifth nodes.
 32. The method of claim 29, wherein each of the portions comprises a value in said space of ordered values.
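The comparison recited in claims 29, 30, and 32 can be sketched in Python as follows, under stated assumptions. Identifiers are modeled as tuples of integers, one value per level, and the function names `precedes` and `is_ancestor` are hypothetical. The handling of the case in which one identifier is a prefix of the other (treated here as an ancestor that precedes its descendant) is an assumption not recited in the claims. The variable-length bit encoding and bytewise comparison of claim 31 are not modeled.

```python
def precedes(id_a, id_b):
    """Return True if the node with id_a appears earlier in the order than id_b.

    Corresponding portions are compared left to right until a non-equal
    pair is encountered; that pair decides the result.
    """
    for a, b in zip(id_a, id_b):
        if a != b:
            return a < b
    # No differing portion found: assume the shorter identifier (an ancestor,
    # or the same node if lengths are equal) does not follow the longer one.
    return len(id_a) < len(id_b)


def is_ancestor(id_a, id_b):
    """Return True if id_a is a proper prefix of id_b (assumed ancestor test)."""
    return len(id_a) < len(id_b) and id_b[:len(id_a)] == id_a


# Three-level example: root (1,), second-level nodes (1, 1) and (1, 2),
# third-level nodes grouped under each second-level node.
fourth_node = (1, 1, 2)  # child of the first second-level node
fifth_node = (1, 2, 1)   # child of the second second-level node
fourth_is_earlier = precedes(fourth_node, fifth_node)
```

Here the first portions are equal, and the second portions (1 and 2) are the first non-equal pair, so the fourth node is determined to appear earlier than the fifth node without examining the third portions.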